# Designing and Analysis of 5-bit CSL, CPL & DPL Multipliers

Surender Kumar\*, Ashwani Kumar Singla\*\* and Lalit Garg\*\*\*

\*Guru Gobind Singh Polytechnic College, Talwandi Sabo, (PB), India \*\*G.T.B. Khalsa Institute of Engineering & Technology,

Chhapianwali (Malout), (PB), India

\*\*\*Yadavindra College of Engineering, Guru Kashi Campus, Talwandi Sabo, (PB) India

(Received 23 March 2012, Accepted 15 April 2012)

ABSTRACT : Multiplication is a heavily used arithmetic operation that figures prominently in signal processing and scientific applications. It is hardware intensive, and the main criteria of interest are high speed, low power dissipation and small area. A good multiplier should therefore be physically compact, fast and low in power consumption. Recently reported logic style comparisons based on full-adder circuits claimed complementary pass transistor logic (CPL) to be much more power-efficient than complementary CMOS. However, new comparisons performed on more efficient CMOS circuit realizations and a wider range of different logic cells, together with realistic circuit arrangements, demonstrate CMOS to be superior to CPL in most cases with respect to speed, area, power dissipation and power-delay product. The most important and widely accepted metrics for measuring the quality of multiplier designs are propagation delay, power dissipation and area. In this paper a new method is proposed to reduce the power and area of 5-bit multipliers by using different logic design styles.

Keywords: Multiplier, CMOS Logic Design Style

## **I. INTRODUCTION**

Multiplication is one of the basic arithmetic operations. Most advanced digital systems today incorporate a parallel multiplication unit to carry out high-speed mathematical operations. In many situations, the multiplier lies directly in the critical-path, resulting in an extremely high demand on its speed. In the past, considerable efforts were put into designing multipliers with higher speed and throughput, which resulted in fast multipliers which can operate with low delay time. However, with the increasing importance of the power issue due to the portability and reliability concerns of electronic devices, recent work has started to look into circuit design techniques that will lower the power dissipation of multipliers.

Power dissipation is the most critical parameter for portability and mobility, and it is classified into dynamic and static power dissipation. Dynamic power dissipation occurs when the circuit is operational, while static power dissipation becomes an issue when the circuit is inactive or in a power-down mode. There are three major components of power dissipation, which are summarized in equation (1) [1]:

$$P_{avg} = P_{switching} + P_{short-circuit} + P_{leakage} = (\alpha_{0 \rightarrow 1} \times C_L \times V_{dd}^2 \times f_{clk}) + (I_{sc} \times V_{dd}) + (I_{leakage} \times V_{dd}) \qquad \dots (1)$$

The first term represents the switching component of power, where $C_L$ is the load capacitance, $f_{clk}$ is the clock frequency and $\alpha_{0 \rightarrow 1}$ is the probability that a power-consuming transition occurs (the activity factor). The second term is due to the direct-path short-circuit current, $I_{sc}$, which arises when both the NMOS and PMOS transistors are simultaneously active, conducting current directly from supply to ground. Finally, the leakage current, $I_{leakage}$, which can arise from substrate injection and sub-threshold effects, is primarily determined by fabrication technology considerations.
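As a rough numerical illustration of equation (1), the following Python sketch evaluates the three components separately. The function name `average_power` and every numerical value in it are illustrative assumptions, not values reported in this paper.

```python
def average_power(alpha, c_load, v_dd, f_clk, i_sc, i_leakage):
    """Evaluate equation (1); all quantities in SI units (F, V, Hz, A).

    Returns a tuple (switching, short_circuit, leakage, total) in watts.
    """
    p_switching = alpha * c_load * v_dd ** 2 * f_clk
    p_short_circuit = i_sc * v_dd
    p_leakage = i_leakage * v_dd
    total = p_switching + p_short_circuit + p_leakage
    return p_switching, p_short_circuit, p_leakage, total


# Hypothetical operating point, chosen only for illustration.
components = average_power(alpha=0.5, c_load=10e-15, v_dd=3.3,
                           f_clk=100e6, i_sc=1e-6, i_leakage=1e-9)
print(components)
```

Because the switching term depends on $V_{dd}^2$, halving the supply voltage in this sketch reduces that term by a factor of four, whereas the other two terms scale only linearly with $V_{dd}$ if the currents are held fixed.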

The switching power dissipation in CMOS digital integrated circuits is a strong function of the power supply voltage. Therefore, reduction of  $V_{dd}$  emerges as a very effective means of limiting the power consumption. However, the saving in power dissipation comes at a significant cost in terms of increased circuit delay. Since the exact analysis of propagation delay is quite complex, a simple first order derivation [2] can be used to show the relation between power supply and delay time

$$T_D \propto \frac{C_L V_{DD}}{K (V_{DD} - V_{TH})^{\alpha}} \qquad \dots (2)$$

where

- $K$ is the transistor's aspect ratio (W/L),
- $V_{TH}$ is the transistor threshold voltage,
- $\alpha$ is the velocity saturation index, which varies between 1 and 2 (not to be confused with the activity factor of equation (1)).

Reducing the supply voltage therefore reduces power, but, as equation (2) shows, the delay increases drastically when the supply voltage approaches the threshold voltage [3].
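The trend predicted by equation (2) can be sketched numerically. In the Python snippet below, the function name `relative_delay`, the threshold voltage of 0.7 V, the velocity saturation index of 1.3 and the list of supply voltages are all assumptions made for illustration; only the functional form comes from equation (2).

```python
def relative_delay(v_dd, v_th, alpha=1.3, c_load=1.0, k=1.0):
    """Right-hand side of equation (2), up to a constant of proportionality."""
    if v_dd <= v_th:
        raise ValueError("equation (2) is only meaningful for v_dd > v_th")
    return c_load * v_dd / (k * (v_dd - v_th) ** alpha)


# Hypothetical values: threshold of 0.7 V, supply swept down towards it.
V_TH = 0.7
reference = relative_delay(3.3, V_TH)
for v_dd in (3.3, 2.5, 1.8, 1.2, 0.9):
    slowdown = relative_delay(v_dd, V_TH) / reference
    print(f"V_DD = {v_dd:.1f} V -> delay x{slowdown:.2f} relative to 3.3 V")
```

For any index between 1 and 2 the expression grows monotonically as the supply is lowered towards the threshold, which is the behaviour described above.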

## **II. LOGIC DESIGN STYLES**

Bisdounis *et al.* have compared a large number of CMOS logic design styles [4]. The adder is the basic building block of a multiplier. For arithmetic applications, three different logic styles (CSL, CPL and DPL) are used for the full adder design in order to achieve the best performance results for the multiplier design [5].

#### A. Complementary Pass Transistor Logic-CPL

The main concept behind CPL is the use of only an n-MOSFET network for the implementation of logic functions. This results in low input capacitance and high-speed operation. The schematic diagram of the CPL full adder circuit is shown in Fig. 1. Because the high voltage level of the pass-transistor outputs is lower than the supply voltage level by the threshold voltage of the pass transistors, the signals have to be amplified by using CMOS inverters at the outputs. CPL circuits consume less power than conventional static circuits because the logic swing of the pass-transistor outputs is smaller than the supply voltage level. The switching power dissipated in charging or discharging the pass-transistor outputs is given by:

$$PD = V_{DD} V_{\text{swing}} C_{\text{node}} f, \qquad \dots (3)$$

where  $V_{swing} = V_{DD} - V_{THn}$ . In the case of conventional static CMOS circuits the voltage swing at the output nodes is equal to the supply voltage, resulting in higher power dissipation. To minimize the static current due to the incomplete turn-off of the MOSFET in the output inverters, a weak MOSFET feedback device can also be added in the CPL circuits of Fig.1, in order to pull the pass-transistor outputs to full supply voltage level. However, this will increase the output node capacitance, leading to higher switching power dissipation and higher propagation delay.
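To make the saving concrete, equation (3) can be compared with the full-swing case $V_{swing} = V_{DD}$ for the same node capacitance and frequency. The numerical values $V_{DD} = 3.3$ V and $V_{THn} = 0.7$ V used below are assumed for illustration only and are not taken from the simulations in this paper:

$$\frac{PD_{CPL}}{PD_{full\text{-}swing}} = \frac{V_{DD}\,(V_{DD} - V_{THn})\,C_{node}\,f}{V_{DD}^{2}\,C_{node}\,f} = 1 - \frac{V_{THn}}{V_{DD}} \approx 1 - \frac{0.7}{3.3} \approx 0.79$$

Under these assumptions the reduced swing alone lowers the switching power of a pass-transistor output node by roughly one fifth, before the extra capacitance of any feedback device is taken into account.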



Fig. 1. CPL Logic Full adder.

#### B. Double Pass Transistor Logic-DPL

The Double Pass-transistor Logic (DPL) is a modified version of CPL that meets the requirements of reduced supply voltage designs. The circuit diagram of the DPL full adder is given in Fig. 2. DPL also has complementary inputs and outputs, and thus it is implemented using dual rails. The main difference between CPL and DPL is that in DPL circuits full voltage swing is achieved by adding pMOS transistors in parallel with the nMOS ones. Hence the problems of noise margin and speed degradation at reduced supply voltages, which arise in CPL circuits because of the reduced high voltage level, are avoided.



Fig. 2. DPL Logic Full adder.

The basic difference of pass-transistor logic compared to the CMOS logic style is that the source side of the logic transistor networks is connected to some input signals instead of the power lines. The advantage is that one pass-transistor network (either nMOS or pMOS) is sufficient to perform the logic operation, which results in a smaller number of transistors and smaller input loads, especially when nMOS networks are used. However, the threshold voltage drop ($V_{out} = V_{dd} - V_{THn}$) through the nMOS transistors while passing logic "1" makes swing (or level) restoration at the gate outputs necessary in order to avoid static currents at the subsequent output inverters or logic gates.

## **III. PARALLEL MULTIPLIER**


A serial multiplier consumes less power, but its delay is larger because of carry rippling. A parallel multiplier has a smaller delay, but its more complex circuitry consumes more power. Consider the multiplication of two unsigned n-bit numbers, where $X = x_{n-1}, x_{n-2}, \ldots, x_0$ is the multiplicand and $Y = y_{n-1}, y_{n-2}, \ldots, y_0$ is the multiplier. The product can be written as [6], [7], [8]:

$$P = \sum_{i=0}^{n-1} \sum_{j=0}^{n-1} x_i y_j 2^{(i+j)}$$

where

$$X = \sum_{i=0}^{n-1} x_i 2^i \qquad \text{(multiplicand)}$$

$$Y = \sum_{j=0}^{n-1} y_j 2^j \qquad \text{(multiplier)}$$

|    |      |      |      |      | A4   | A3   | A2   | A1   | A0   |
|----|------|------|------|------|------|------|------|------|------|
|    |      |      |      |      | B4   | B3   | B2   | B1   | B0   |
|    |      |      |      |      | A4B0 | A3B0 | A2B0 | A1B0 | A0B0 |
|    |      |      |      | A4B1 | A3B1 | A2B1 | A1B1 | A0B1 |      |
|    |      |      | A4B2 | A3B2 | A2B2 | A1B2 | A0B2 |      |      |
|    |      | A4B3 | A3B3 | A2B3 | A1B3 | A0B3 |      |      |      |
|    | A4B4 | A3B4 | A2B4 | A1B4 | A0B4 |      |      |      |      |
| P9 | P8   | P7   | P6   | P5   | P4   | P3   | P2   | P1   | P0   |
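The partial-product scheme above can be sketched in a few lines of Python. The helper names `partial_products` and `multiply` are hypothetical; each row is the AND of every multiplicand bit with one multiplier bit, shifted left by the weight of that multiplier bit, exactly as in the double sum for $P$.

```python
def partial_products(a, b, n=5):
    """Return the n shifted partial-product rows of two unsigned n-bit integers."""
    rows = []
    for j in range(n):
        b_j = (b >> j) & 1
        row = 0
        for i in range(n):
            a_i = (a >> i) & 1
            # One AND gate per bit pair, placed at weight 2**(i + j).
            row |= (a_i & b_j) << (i + j)
        rows.append(row)
    return rows


def multiply(a, b, n=5):
    """Sum the partial products to obtain the 2n-bit product."""
    return sum(partial_products(a, b, n))


print(partial_products(0b10110, 0b01011))
print(multiply(0b10110, 0b01011))
```

For two 5-bit operands this produces five rows and a product that fits in the ten bits P9 to P0.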

#### A. Array Multiplier

An array multiplier is very regular in structure, as shown in Fig. 3. It uses short wires that go from one full adder to adjacent full adders horizontally, vertically or diagonally [9]. An $n \times n$ array of AND gates can compute all the $a_i b_j$ terms simultaneously. The terms are summed by an array of $n(n-2)$ full adders and $n$ half adders. The shifting of partial products for their proper alignment is performed by simple routing and does not require any logic.

The number of rows in the array multiplier corresponds to the length of the multiplier, and the width of each row corresponds to the width of the multiplicand. The output of each row of adders acts as input to the next row of adders. Each row of full adders or 3 : 2 compressors adds a partial product to the partial sum, generating a new partial sum and a sequence of carries.
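A bit-level sketch can make the row-by-row accumulation concrete. The Python code below assumes one common arrangement, a ripple-carry array in which each row adds one partial product to the running partial sum; the exact cell placement of Fig. 3 may differ, and the function names are hypothetical. Besides the product, it counts the adder cells it uses.

```python
def half_adder(a, b):
    """Return (sum, carry) of two bits."""
    return a ^ b, a & b


def full_adder(a, b, c):
    """Return (sum, carry) of three bits."""
    return a ^ b ^ c, (a & b) | (b & c) | (a & c)


def array_multiply(a, b, n=5):
    """Assumed ripple-carry array multiplier for n >= 2.

    Returns (product, full_adder_count, half_adder_count).
    """
    a_bits = [(a >> i) & 1 for i in range(n)]
    b_bits = [(b >> j) & 1 for j in range(n)]
    # n x n array of AND gates: pp[j][i] = a_i AND b_j.
    pp = [[a_bits[i] & b_bits[j] for i in range(n)] for j in range(n)]

    full_adders = half_adders = 0
    product_bits = [pp[0][0]]   # P0 needs no adder
    upper = pp[0][1:]           # partial-sum bits still to be accumulated
    for r in range(1, n):
        sums, carry = [], None
        for i in range(n):
            inputs = [pp[r][i]]
            if i < len(upper):
                inputs.append(upper[i])
            if carry is not None:
                inputs.append(carry)
            if len(inputs) == 3:
                s, carry = full_adder(*inputs)
                full_adders += 1
            else:
                s, carry = half_adder(*inputs)
                half_adders += 1
            sums.append(s)
        product_bits.append(sums[0])   # lowest sum bit of the row is final
        upper = sums[1:] + [carry]     # the rest feeds the next row
    product_bits.extend(upper)
    product = sum(bit << k for k, bit in enumerate(product_bits))
    return product, full_adders, half_adders


print(array_multiply(31, 31))
```

For $n = 5$ this arrangement uses 15 full adders and 5 half adders, which agrees with the $n(n-2)$ and $n$ counts quoted above.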



Fig. 3. 5-bit Array Multiplier (AM).

The delay associated with the array multiplier is the time taken by the signals to propagate through the AND gates and adders that form the multiplication array. The delay of an array multiplier depends only upon the depth of the array, not on the partial product width. It is given by [10]:

$$T(critical) = [(N-1) + (N-2)] * T(Carry) + (N-1) * T(Sum) + T(AND) \qquad \dots (4)$$

where T(Carry) is the propagation delay between input and output carry, T(Sum) is the delay between the input carry and sum bit of the full adder, T(AND) is the delay of AND gate, N is the length of multiplier operand.
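For the 5-bit multipliers considered in this paper, substituting $N = 5$ into equation (4) gives the following worked instance (no cell delays are assumed here):

$$T(critical) = [(5-1) + (5-2)] \, T(Carry) + (5-1) \, T(Sum) + T(AND) = 7 \, T(Carry) + 4 \, T(Sum) + T(AND)$$

The critical path therefore passes through seven carry stages, four sum stages and one AND gate.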

The advantage of the array multiplier is its regular structure, which makes it easy to lay out and compact. In VLSI designs, regular structures can be tiled over one another. This reduces the risk of mistakes and also reduces layout design time. This regular layout is widely used in VLSI math coprocessors and DSP chips [11].

#### B. Tree Multiplier

C. S. Wallace suggested a fast technique to perform multiplication in 1964 [12]. The amount of hardware required to perform this style of multiplication is large but the delay is near optimal.



Fig. 4. 5-bit Tree Multiplier (TM).

| Sl. No. | Multiplier Type | Design Technique | Power Dissipation (nW) | Worst Case Propagation Delay (ns) | No. of Transistors | Power Delay Product (m-nJ) |
|---------|-----------------|------------------|------------------------|-----------------------------------|--------------------|----------------------------|
| 1.      | Array           | CSL              | 1.21                   | 0.91                              | 715                | 1.10                       |
|         |                 | CPL              | 3.40                   | 0.91                              | 630                | 3.09                       |
|         |                 | DPL              | 17.7                   | 1.36                              | 870                | 24.07                      |
| 2.      | Tree            | CSL              | 1.20                   | 0.91                              | 710                | 1.09                       |
|         |                 | CPL              | 3.42                   | 0.91                              | 630                | 3.11                       |
|         |                 | DPL              | 17.8                   | 1.36                              | 870                | 24.20                      |

Table 1. Performance parameters of 5-bit multipliers

The delay is proportional to log (N) for column compression multipliers where N is the word length. This architecture is used where speed is the main concern not the layout regularity.

This class of multipliers is based on a reduction tree in which different schemes for compressing the partial product bits can be implemented. In a tree multiplier the partial-sum adders are arranged in a tree-like fashion, reducing both the critical path and the number of adders needed, as shown in Fig. 4.

The partial products or multiples are generated simultaneously by using a collection of AND gates. The multiples are added in a combinational partial-product reduction tree using carry save adders (CSAs), which reduces them to two operands for the final addition. The results from the CSAs are in redundant form. Finally, the redundant result is converted into standard binary output at the bottom by the use of a carry propagate adder (CPA) [9].
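The reduction described above can be sketched at bit level. The Python code below is a simplified column-compression model that uses only 3 : 2 compressors (full adders) and passes leftover bits through unchanged, so it is not necessarily the exact tree of Fig. 4; the function name `tree_multiply` is hypothetical, and ordinary integer addition stands in for the final CPA.

```python
def tree_multiply(a, b, n=5):
    """Multiply two unsigned n-bit integers with a 3:2 column-compression tree.

    Returns (product, stages), where stages is the number of reduction levels.
    """
    # columns[k] collects every bit of weight 2**k, starting with the AND terms.
    columns = [[] for _ in range(2 * n)]
    for i in range(n):
        for j in range(n):
            columns[i + j].append(((a >> i) & 1) & ((b >> j) & 1))

    stages = 0
    while max(len(col) for col in columns) > 2:
        reduced = [[] for _ in range(len(columns) + 1)]
        for k, col in enumerate(columns):
            full_groups = len(col) // 3
            for g in range(full_groups):
                x, y, z = col[3 * g: 3 * g + 3]
                reduced[k].append(x ^ y ^ z)                        # sum bit
                reduced[k + 1].append((x & y) | (y & z) | (x & z))  # carry bit
            reduced[k].extend(col[3 * full_groups:])  # leftover bits pass through
        columns = reduced
        stages += 1

    # Two operands remain (redundant form); the final addition models the CPA.
    row0 = sum(col[0] << k for k, col in enumerate(columns) if len(col) > 0)
    row1 = sum(col[1] << k for k, col in enumerate(columns) if len(col) > 1)
    return row0 + row1, stages


product, stages = tree_multiply(31, 31)
print(product, stages)
```

All compressors within one stage operate in parallel, which is why the depth of such a tree grows far more slowly with the word length than the row count of the array multiplier.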

## **IV. PERFORMANCE PARAMETERS AND SIMULATION SET-UP**

The 5-bit multipliers are compared on the basis of performance parameters such as propagation delay, number of transistors and power dissipation. To achieve better performance, the circuits are designed using a CMOS process by MOSIS in 0.35 µm technology. The channel width of the transistors is 2.8 µm for the NMOS and 7.6 µm for the PMOS. The output capacitance is taken as 10 fF in all cases, and the operating frequency is 10 GHz. All the circuits have been designed using TANNER EDA [13]. Power estimation is a difficult task because of its dependency on various parameters, and it has received a lot of attention [14]. The delay was calculated for the worst-case pattern $11111 \times 11111$. The direct simulation method is used to analyse the results [15]. The comparative results for the two 5-bit multipliers in the different logic design styles are given in Table 1.
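The power-delay product column of Table 1 can be cross-checked against the power and delay columns. The Python sketch below copies the tabulated values into a dictionary (the name `TABLE_1` and the helper `power_delay_product` are hypothetical) and recomputes the product in the units that result from multiplying the two columns.

```python
# (multiplier type, logic style): (power in nW, worst-case delay in ns, tabulated PDP)
TABLE_1 = {
    ("Array", "CSL"): (1.21, 0.91, 1.10),
    ("Array", "CPL"): (3.40, 0.91, 3.09),
    ("Array", "DPL"): (17.7, 1.36, 24.07),
    ("Tree", "CSL"): (1.20, 0.91, 1.09),
    ("Tree", "CPL"): (3.42, 0.91, 3.11),
    ("Tree", "DPL"): (17.8, 1.36, 24.20),
}


def power_delay_product(power, delay):
    """Product of the power and delay figures as tabulated."""
    return power * delay


for (multiplier, style), (power, delay, tabulated) in TABLE_1.items():
    computed = power_delay_product(power, delay)
    print(f"{multiplier:5s} {style}: computed {computed:.3f}, tabulated {tabulated:.2f}")
```

In this check every recomputed product agrees with the tabulated figure to within about 0.01, so the last column is consistent with the other two.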



Fig. 5. Power Dissipation vs Propagation Delay of AM.



Fig. 6. Power Dissipation vs Propagation Delay of TM.

The relationships between power and delay performance parameters of 5-bit Array Multiplier and Tree Multiplier architectures are shown in Fig. 5 and Fig. 6 respectively.

## **V. DISCUSSION AND CONCLUSION**

It has been observed that the complementary pass transistor logic (CPL) design style exhibits better characteristics (speed and area) than the other design styles.

So, the CPL logic style can be used where portability and high speed are the prime aims, whereas CSL consumes the lowest power among the three. Since the propagation delay of the CPL logic design style is comparable to that of the DPL and CSL logic design styles, CPL can be considered the best logic design style with respect to all parameters of the 5-bit multiplier architectures, as shown in Table 1.

#### REFERENCES

- [1] N. Weste and K. Eshraghian, "Principles of CMOS VLSI Design: A Systems Perspective", Pearson Addison-Wesley Publishers (2005).
- [2] A. Bellaouar, and M. Elmasry, "Low-Power Digital VLSI Design: Circuits and Systems", Boston, Massachusetts: Kluwer Academic Publishers (1995).
- [3] S. Sun, and P. Tsui, "Limitation of CMOS supply-voltage scaling by MOSFET threshold voltage", *IEEE Journal of Solid-State Circuits*, Vol. 30, pp. 947-949(1995).
- [4] L. Bisdounis, D. Gouvetas and O. Koufopavlou, "A comparative study of CMOS circuit design styles for low-power high-speed VLSI circuits", *Int. J. of Electronics*, Vol. 84, (6): 599-613(1998).
- [5] Anu Gupta, "Design Explorations of VLSI Arithmetic Circuits", Ph.D. Thesis, BITS, Pilani, India (2003).
- [6] Jan M. Rabaey, Anantha Chandrakasan and Borivoje Nikolic, "Digital Integrated Circuits- A design Perspective", PHI, Second edition (2004).
- [7] Neil H.E.Weste, David Harris and Ayan Banerjee, "CMOS VLSI Design-A Circuits and System Perspective", Pearson Education, Third edition (2009).
- [8] V. H. Hamacher, Z. G. Vranesic and S. G. Zaky, "Computer Organization", McGraw-Hill (1990).
- [9] B Parhami, "Computer Arithmetic–Algorithms and Hardware Designs", Oxford University Press (2000).
- [10] Jan M Rabaey, Anantha Chandrakasan and Borivoje Nikolic, "Digital Integrated Circuits", PHI Publishers, Second Edition (2003).

- [11] Frederick A Ware *et al.*,"64 Bit Monolithic Floating Point Processors", *IEEE Journal of Solid-State Circuits*, **17**(5): (1982).
- [12] C S Wallace, "A Suggestion for a Fast Multiplier", IEEE Transactions on Electronic Computers, EC-13, pp. 14-17(1964).
- [13] Tanner EDA Inc. 1988, User's Manual, 2005.
- [14] Najm, F., "A survey of power estimation techniques in VLSI circuits", *IEEE Transactions on VLSI Systems*, Vol. 2, pp. 446-455(1995).
- [15] Kang, S.,"Accurate simulation of power dissipation in VLSI circuits", *IEEE Journal of Solid-State Circuits*, Vol. 21, pp. 889-891(1986).